How Statistical Information from the Web can Help Identify Named Entities
نویسنده
چکیده
This paper presents a Natural Language Processing (NLP) approach to filter Named Entities (NE) from a list of collocation candidates. The NE are defined as the names of ’People’, ’Places’, ’Organizations’, ’Software’, ’Illnesses’, and so forth. The proposed method is based on statistical measures associated with Web resources to identify NE. Our method has three stages: (1) Building artificial prepositional collocations from Noun-Noun candidates; (2) Measuring the ”relevance” of the resulting prepositional collocations using statistical methods (Web Mining); (3) Selecting prepositional collocations. The evaluation of Noun-Noun collocations from French and English corpora confirmed the relevance of our system.
منابع مشابه
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
متن کاملUsing Wikipedia's Category Structure for Entity Search
In this paper we investigate how the category structure of Wikipedia can be exploited for Entity Ranking. In the last decade, the Web has not only grown in size, but also changed its character, due to collaborative content creation and an increasing amount of structure. Current Search Engines find Web pages rather than information or knowledge, and leave it to the searchers to locate the sought...
متن کاملExtracting Semantic Networks among Named Entities from Websites
To enable machine processing of webpages, it is important to identify the relationships among named entities. Named entities, like, people, organizations, and places are important pieces of information that must be extracted. The scale of the web indicates that manual extraction is not feasible. We propose a system that automatically constructs a semantic network of named entities from webpages...
متن کاملHigh Performance Clustering for Web Person Name Disambiguation Using Topic Capturing
Searching for named entities is a common task on the web. Among different named entities, person names are among the most frequently searched terms. However, many people can share the same name and the current search engines are not designed to identify a specific entity, or a namesake. One possible solution is to identify a namesake through clustering webpages for different namesakes. In this ...
متن کاملبهبود شناسایی موجودیتهای نامدار فارسی با استفاده از کسره اضافه
Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
متن کامل